Search CORE

105 research outputs found

Evaluación y análisis de una aproximación a la fusión sensorial neuronal mediante el uso de sensores pulsantes de visión / audio y redes neuronales de convolución

Author: Ríos Navarro José Antonio
Publication venue
Publication date: 19/07/2017
Field of study

En este trabajo se pretende avanzar en el conocimiento y posibles implementaciones hardware de los mecanismos de Deep Learning, así como el uso de la fusión sensorial de forma eficiente utilizando dichos mecanismos. Para empezar, se realiza un análisis y estudio de los lenguajes de programación paralela actuales, así como de los mecanismos de Deep Learning para la fusión sensorial de visión y audio utilizando sensores neuromórficos para el uso en plataformas de FPGA. A partir de estos estudios, se proponen en primer lugar soluciones implementadas en OpenCL así como en hardware dedicado, descrito en systemverilog, para la aceleración de algoritmos de Deep Learning comenzando con el uso de un sensor de visión como entrada. Se analizan los resultados y se realiza una comparativa entre ellos. A continuación se añade un sensor de audio y se proponen mecanismos estadísticos clásicos, que sin ofrecer capacidad de aprendizaje, permiten integrar la información de ambos sensores, analizando los resultados obtenidos junto con sus limitaciones. Como colofón de este trabajo, para dotar al sistema de la capacidad de aprendizaje, se utilizan mecanismos de Deep Learning, en particular las CNN1, para fusionar la información audiovisual y entrenar el modelo para desarrollar una tarea específica. Al final se evalúa el rendimiento y eficiencia de dichos mecanismos obteniendo conclusiones y unas proposiciones de mejora que se dejarán indicadas para ser implementadas como trabajos futuros.In this work it is intended to advance on the knowledge and possible hardware implementations of the Deep Learning mechanisms, as well as on the use of sensory fusión efficiently using such mechanisms. At the beginning, it is performed an analysis and study of the current parallel programing, furthermore of the Deep Learning mechanisms for audiovisual sensory fusion using neuromorphic sensor on FPGA platforms. Based on these studies, first of all it is proposed solution implemented on OpenCL as well as dedicated hardware, described on systemverilog, for the acceleration of Deep Learning algorithms, starting with the use of a vision sensor as input. The results are analysed and a comparison between them has been made. Next, an audio sensor is added and classic statistical mechanisms are proposed, which, without providing learning capacity, allow the integration of information from both sensors, analysing the results obtained along with their limitations. Finally, in order to provide the system with learning capacity, Deep Learning mechanisms, in particular CNN, are used to merge audiovisual information and train the model to develop a specific task. In the end, the performance and efficiency of these mechanisms have been evaluated, obtaining conclusions and proposing improvements that will be indicated to be implemented as future works

idUS. Depósito de Investigación Universidad de Sevilla

Retinal ganglion cell software and FPGA model implementation for object detection and tracking

Author: Delbrück Tobias
Linares Barranco Alejandro
Moeys Diederick P.
Ríos Navarro José Antonio
Publication venue: IEEE Computer Society
Publication date: 01/01/2016
Field of study

This paper describes the software and FPGA implementation of a Retinal Ganglion Cell model which detects moving objects. It is shown how this processing, in conjunction with a Dynamic Vision Sensor as its input, can be used to extrapolate information about object position. Software-wise, a system based on an array of these of RGCs has been developed in order to obtain up to two trackers. These can track objects in a scene, from a still observer, and get inhibited when saccadic camera motion happens. The entire processing takes on average 1000 ns/event. A simplified version of this mechanism, with a mean latency of 330 ns/event, at 50 MHz, has also been implemented in a Spartan6 FPGA.European Commission FP7-ICT-600954Ministerio de Economía y Competitividad TEC2012-37868-C04-02Junta de Andalucía P12-TIC-130

idUS. Depósito de Investigación Universidad de Sevilla

Live Demonstration: Retinal ganglion cell software and FPGA implementation for object detection and tracking

Author: Delbrück Tobias
Linares Barranco Alejandro
Moeys Diederick P.
Ríos Navarro José Antonio
Publication venue: IEEE Computer Society
Publication date: 01/01/2016
Field of study

This demonstration shows how object detection and tracking are possible thanks to a new implementation which takes inspiration from the visual processing of a particular type of ganglion cell in the retina

idUS. Depósito de Investigación Universidad de Sevilla

Efficient DMA transfers management on embedded Linux PSoC for Deep-Learning gestures recognition: Using Dynamic Vision Sensor and NullHop one-layer CNN accelerator to play RoShamBo

Author: Jiménez Moreno Gabriel
Linares Barranco Alejandro
Ríos Navarro José Antonio
Tapiador Morales Ricardo
Publication venue: ACM Digital Library
Publication date: 01/01/2019
Field of study

This demonstration shows a Dynamic Vision Sensor able to capture visual motion at a speed equivalent to a highspeed camera (20k fps). The collected visual information is presented as normalized histogram to a CNN accelerator hardware, called NullHop, that is able to process a pre-trained CNN to play Roshambo against a human. The CNN designed for this purpose consist of 5 convolutional layers and a fully connected layer. The latency for processing one histogram is 8ms. NullHop is deployed on the FPGA fabric of a PSoC from Xilinx, the Zynq 7100, which is based on a dual-core ARM computer and a Kintex-7 with 444K logic cells, integrated in the same chip. ARM computer is running Linux and a specific C++ controller is running the whole demo. This controller runs at user space in order to extract the maximum throughput thanks to an efficient use of the AXIStream, based of DMA transfers. This short delay needed to process one visual histogram, allows us to average several consecutive classification outputs. Therefore, it provides the best estimation of the symbol that the user presents to the visual sensor. This output is then mapped to present the winner symbol within the 60ms latency that the brain considers acceptable before thinking that there is a trick.Ministerio de Economía y Competitividad TEC2016-77785-

idUS. Depósito de Investigación Universidad de Sevilla

Neuromorphic Approach Sensitivity Cell Modeling and FPGA Implementation

Author: Delbruck Tobias
Linares Barranco Alejandro
Liu Hongjie
Moeys Diederick P.
Ríos Navarro José Antonio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Neuromorphic engineering takes inspiration from biology to solve engineering problems using the organizing principles of biological neural computation. This field has demonstrated success in sensor based applications (vision and audition) as well in cognition and actuators. This paper is focused on mimicking an interesting functionality of the retina that is computed by one type of Retinal Ganglion Cell (RGC). It is the early detection of approaching (expanding) dark objects. This paper presents the software and hardware logic FPGA implementation of this approach sensitivity cell. It can be used in later cognition layers as an attention mechanism. The input of this hardware modeled cell comes from an asynchronous spiking Dynamic Vision Sensor, which leads to an end-to-end event based processing system. The software model has been developed in Java, and computed with an average processing time per event of 370 ns on a NUC embedded computer. The output firing rate for an approaching object depends on the cell parameters that represent the needed number of input events to reach the firing threshold. For the hardware implementation on a Spartan6 FPGA, the processing time is reduced to 160 ns/event with the clock running at 50 MHz.Ministerio de Economía y Competitividad TEC2016-77785-PUnión Europea FP7-ICT-60095

idUS. Depósito de Investigación Universidad de Sevilla

Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

Author: Kadetotad Deepak
Kim Minkyu
Linares Barranco Alejandro
Ríos Navarro José Antonio
Seo Jae-sun
Tapiador Morales Ricardo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. In this paper both Altera and Xilinx adopted OpenCL co-design frameworks for pseudo-automatic development solutions are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.Ministerio de Economía y Competitividad TEC2016-77785-

idUS. Depósito de Investigación Universidad de Sevilla

Spiking row-by-row FPGA Multi-kernel and Multi-layer Convolution Processor.

Author: Domínguez Morales Juan Pedro
Gutiérrez Galán Daniel
Linares Barranco Alejandro
Ríos Navarro José Antonio
Tapiador Morales Ricardo
Publication venue: IEEE Computer Society
Publication date: 01/01/2019
Field of study

Spiking convolutional neural networks have become a novel approach for machine vision tasks, due to the latency to process an input stimulus from a scene, and the low power consumption of these kind of solutions. Event-based systems only perform sum operations instead of sum of products of framebased systems. In this work an upgrade of a neuromorphic event-based convolution accelerator for SCNN, which is able to perform multiple layers with different kernel sizes, is presented. The system has a latency per layer from 1.44 μs to 9.98μs for kernel sizes from 1x1 to 7x7

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

Author: Kadetotad Deepak
Kim Minkyu
Linares Barranco Alejandro
Ríos Navarro José Antonio
Seo Jae-Sun
Tapiador Morales Ricardo
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016
Field of study

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times

idUS. Depósito de Investigación Universidad de Sevilla

System based on inertial sensors for behavioral monitoring of wildlife

Author: Domínguez Morales Juan Pedro
Jiménez Fernández Ángel Francisco
Linares Barranco Alejandro
Ríos Navarro José Antonio
Tapiador Morales Ricardo
Publication venue: IEEE Computer Society
Publication date: 01/01/2015
Field of study

Sensors Network is an integration of multiples sensors in a system to collect information about different environment variables. Monitoring systems allow us to determine the current state, to know its behavior and sometimes to predict what it is going to happen. This work presents a monitoring system for semi-wild animals that get their actions using an IMU (inertial measure unit) and a sensor fusion algorithm. Based on an ARM-CortexM4 microcontroller this system sends data using ZigBee technology of different sensor axis in two different operations modes: RAW (logging all information into a SD card) or RT (real-time operation). The sensor fusion algorithm improves both the precision and noise interferences.Junta de Andalucía P12-TIC-130

idUS. Depósito de Investigación Universidad de Sevilla

Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator

Author: Amaya Rodríguez Claudio Antonio
Domínguez Morales Manuel Jesús
Jiménez Fernández Ángel Francisco
Linares Barranco Alejandro
Ríos Navarro José Antonio
Tapiador Morales Ricardo
Publication venue: IEEE Computer Society
Publication date: 01/01/2018
Field of study

Many FPGAs vendors have recently included embedded processors in their devices, like Xilinx with ARM-Cortex A cores, together with programmable logic cells. These devices are known as Programmable System on Chip (PSoC). Their ARM cores (embedded in the processing system or PS) communicates with the programmable logic cells (PL) using ARM-standard AXI buses. In this paper we analyses the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a co-design real scenario for Convolutional Neural Networks (CNN) accelerator, which processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. In the PS side, a Linux operating system is running, which recollects visual events from the neuromorphic sensor into a normalized frame, and then it transfers these frames to the accelerator of multi-layered CNNs, and read results, using an AXI-DMA bus in a per-layer way. As these kind of accelerators try to process information as quick as possible, data bandwidth becomes critical and maintaining a good balanced data throughput rate requires some considerations. We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfer and two different ways of transfers management: through a polling routine at the userlevel of the OS, and through a dedicated interrupt-based kernellevel driver. We demonstrate that for longer enough packets, the kernel-level driver solution gets better timing in computing a CNN classification example. Main advantage of using kernel-level driver is to have safer solutions and to have tasks scheduling in the OS to manage other important processes for our application, like frames collection from sensors and their normalization.Ministerio de Economía y Competitividad TEC2016-77785-

idUS. Depósito de Investigación Universidad de Sevilla